ABSTRACT



This study analyzed 13 national datasets and evaluated their 
utility for addressing policy questions concerning college faculty 
availability (such as minority hiring, affirmative action, supply and demand 
in specific disciplines) and the overproduction of Ph.D.s. Each dataset was 
evaluated for its utility in modeling faculty availability. Also documented 
were the types of assumptions which need to be addressed in building models 
based on these datasets. Every national survey instrument related to faculty 
was reviewed, along with information about the data element dictionary, the 
value labels for categorical variables, the sample size, the population size, 
error estimates and weighting procedures. Analysis of the datasets suggests 
that, while many important policy questions may be studied with the data, it 
is not currently possible to complete the critical cross-tabulation of gender 
within ethnicity by rank within tenure status by discipline at the 
institution level. It is recommended that the Integrated Postsecondary 
Education Data System expand its survey to include a table on full-time 
instructional, research and service faculty. The table should be broken out 
by gender within ethnicity by rank within tenure status by discipline 
clusters. (Contains 55 references.) (DM) 

"Developing Benchmarks for Faculty Hiring" 



ABSTRACT 

This paper analyzes 13 national datasets about faculty and evaluates their utility for 
addressing policy questions about faculty availability and the overproduction of Ph.D.s. This 
work was funded by an NSF-NCES-AIR Research Fellowship for 1996-97. While it is 
important to review several of the complex econometric approaches and their assumptions about 
Ph.D. production, the primary purpose of this research is to document specific datasets which 
may be used for different types of models. A basic pipeline model is presented which 
demonstrates how several 1993 datasets may be used to conduct more complicated analyses. 



Introduction 

How many minority and female faculty in a specific discipline are available in the 
national pool for a new hire? 

Is a department producing too many Ph.D.s? 

These two campus-level policy questions demonstrate the need for data on benchmarks 
for faculty hiring. Faculty availability data are at the heart of required affirmative action reports 
for institutions that receive federal funding (Milam, 1995b). While the climate for affirmative 
action has changed dramatically in California and other states, a parallel debate about faculty 
availability is taking place at the national level in response to alarms about the overproduction of 
Ph.D.s (Geiger, 1997; Massy and Goldman, 1995; SRS Professional Society Workshops, 1995a, 
1995b, 1996). 

This paper analyzes the national datasets about faculty and evaluates their utility for 
addressing policy questions about faculty availability. This work was funded by an 
NSF-NCES-AIR Research Fellowship for 1996-97. While it is important to review several of the complex 
econometric approaches and their assumptions about Ph.D. production, the primary purpose of 
this research is to document specific datasets which may be used for different types of models. 

A basic pipeline model is presented which demonstrates how several 1993 datasets may be used 
to conduct more complicated analyses. 

Literature Review 

Discussion of faculty availability is informed by five distinct literature bases: 

(1) Scholarship about affirmative action and the pipeline of women and minority faculty 
in higher education (Bereman and Scott, 1991; Clark and Corcoran, 1986; Cunningham and 
Hemmeter, 1992; Ethington, 1988; Hallock, 1994; Hanna, 1988; Harvey and Valadez, 1994; Lie 
et al., 1994; Opp and Smith, 1994; Reynolds, 1992; Rodriguez, 1994; Schroeder and Mynatt, 
1993; Smith, 1994; Tack and Patitu, 1992; and Washington and Harvey, 1989). 

(2) Documentation of availability data on women and minorities in higher education 
(Gill, 1992; Milam, 1995a, 1995b; NEBHA, WICHE, and SREB, 1994; Texas Higher Education 
Coordinating Board, 1992; and Vander Maerdt, 1989). 

(3) A labor economics approach to understanding the supply and demand for faculty in 
the disciplines, particularly science and engineering (Atkinson, 1990; Committee on Science, 
Engineering, and Public Policy, 1995; Geiger, 1997; Leslie and Oaxaca, 1990; Massy and 
Goldman, 1995; McGuire and Price, 1989; Science and Engineering Indicators, 1996; SRS 
Professional Society Workshops, 1995a, 1995b, 1996; Syverson, 1997; Syverson and Forster, 
1985). 

(4) Other surveys and research about the nature of higher education faculty, including 
data about rank, tenure status, activity, workload, productivity, program rankings, research, and 
retirement (Tang and Chamberlain, 1997; Lozier and Dooris, 1989, 1991; National Research 
Council, 1995; National Center for Education Statistics, 1996a, 1996b). 

(5) Scholarship about the Ph.D. process, including time to degree, mentoring, and 
post-doctoral programs (Bowen and Rudenstine, 1992; Bowen and Sosa, 1989; Strieker, 1994). 

Each of these approaches offers different assumptions for consideration in developing 
models of faculty availability. The pipeline scholarship suggests that the entrance of women and 
minorities into the faculty depends on the climate and nature/tier of the graduate institution, 
effective mentoring, and faculty diversity for role models, among other factors. Availability 
research uses survey data to calculate utilization rates and predict shortages using models based 
on simple, descriptive statistics. The research of Massy and Goldman (1995) on The Production 
and Utilization of Science and Engineering Doctorates in the United States, the Committee on 
Science, Engineering, and Public Policy's (1995) work Reshaping the Graduate Education of 
Scientists and Engineers, and NSF's Science and Engineering Indicators - 1996 chapter on 
"Higher Education in Science and Engineering" are examples of econometric approaches which 
make complex assumptions about matriculation rates, faculty rank transitions, retirement rates, 
and departmental activity based on undergraduate enrollment, budgets, and sponsored research 
funding to estimate the long-term equilibrium of supply and demand. Numerous other efforts 
focus on understanding topics such as changing patterns in the faculty ranks, tenure status, and 
the greying of the professoriate. These shifts in demographics can be incorporated into complex 
models that, for example, account for the growth of non-tenure track positions and the increased 
use of part-time faculty. Finally, the literature on the Ph.D. process informs models with 
assumptions about the percent of graduates entering academe, the increase in post-doctoral 
appointments, unemployment, underemployment, and the segmentation of tiers and types of 
institutions. 

Several sets of assumptions emerge from the literature which need to be addressed by 
policy makers, if only to state whether they are accounted for and whether a particular faculty 
availability model is therefore limited in its implications. These may be grouped by their effect 
on data about graduate students, Ph.D. recipients, post-doctoral appointments, new assistant 
professors, and other faculty. 

Graduate students 

Assumptions need to be made about the effect of the number of bachelor's and master's 
degrees on doctoral student supply, the relationship of the master's to the doctorate, breakouts by 
gender and ethnicity, age, retention, and completion rates. Example questions: How valuable is 
master's enrollment in predicting supply for Ph.D. programs? What is the impact of funding 
patterns by field and type of institution on completion? Based on undergraduate and master's 
enrollments, how many Ph.D. students will be enrolled in five years? 

Ph.D. recipients 

Complex models require that assumptions be made about time to degree, matriculation 
and graduation rates, gender, ethnicity, age, field of specialization/dissertation topics, the effect 
of different tiers of institutions on marketability, and the percent of recipients entering academe. 
Example questions: How will the percent of students entering academe vary by field, funding, 
and tier of institution? Are too many Ph.D. students being produced to meet the demand of 
academe, government, and industry? 

Post-doctoral appointments 

Assumptions need to be made about the growth of post-doctoral appointments and their 
effect on obtaining positions in academe. Example questions: What percentage of Ph.D. 
recipients by field go into post-docs? Are post-doc appointments taken because tenure track 
positions are unavailable or because the nature of the discipline requires advanced study? 

New assistant professors 

Assumptions need to be made about the number of new tenure track, assistant professor 
slots. This requires complex assumptions about faculty turnover, rank transitions, retirements, 
etc. Massy and Goldman (1995) suggest that the demand for new hires is driven positively by 
undergraduate enrollments in a department and departmental budgets and negatively by the 
amount of sponsored research. Other assumptions need to be made about whether retiring 
faculty are replaced with tenure track positions or part-time and non-tenure track faculty. 
Example questions: What is the effect of declining tenure track appointments and the increased 
use of restricted faculty on Ph.D. demand? How many new faculty are hired each year in a 
specific discipline? 

Other faculty 

Assumptions about faculty turnover, rank transitions, retirement planning, and the effect 
of early retirement policies need to be taken into account. Increased interdisciplinary work, the 
feminization of some disciplines, the dissolution of tenure, collective bargaining, and other 
factors also influence the demand for new faculty hires. Example questions: How has the lifting 
of mandatory retirement and the greying of the tenured professoriate helped push tenure track 
positions into non-tenure track? Is most of the growth of women and minorities at the non-tenure 
track level? Has the use of part-time faculty for undergraduate instruction limited the 
need for full-time faculty altogether? What is the effect of faculty workload policies on the 
number of faculty needed in a department? 

In evaluating different datasets, it is important to acknowledge the implicit assumptions 
and limitations of the data as they impact modeling about the faculty pipeline. 

Methodology 

The primary purpose of this research is to (1) evaluate each national dataset for its utility 
in modeling faculty availability and (2) document the types of assumptions that need to be addressed 
in building models based on these datasets. An extensive review was performed of the literature 
bases detailed above and of related documentation on NSF, NRC, NCES, and other agencies' 
websites. This review led to the construction of a critical table (or cross-tab or pivot table), 
which illustrates the lowest level of aggregation necessary for modeling. 



This critical cross-tab includes the faculty variables rank within tenure status by gender 
within ethnicity. The data need to be completed for the lowest taxonomy of a discipline for a 
single institution. (See table #1 below). 



Table #1: Critical cross-tab at lowest level of aggregation 

Institution: 
Carnegie: 
Control: 
Department: 
Discipline: 

                               Ethnicity 
                            Male      Female 
Tenured 
    Professor 
    Assoc Prof 
    Asst. Prof 
    Instructor 
    Lecturer 
Tenure Track 
    Professor 
    Assoc Prof 
    Asst. Prof 
    Instructor 
    Lecturer 
Non-Tenure Track 
    Professor 
    Assoc Prof 
    Asst. Prof 
    Instructor 
    Lecturer 

# Post-Docs 
# Ph.D.s Awarded 

In the institutional research literature on peer review, the standard categorical variables 
for institutional comparisons are Carnegie classification and control. Some peer comparison 
models document additional information about location, enrollment size (broken out by 
undergraduate and graduate), part-time enrollment percentage, and full- and part-time faculty 
size. Traditionally, peer selection and comparison models use variables from NCES IPEDS 
reports, the College Board, John Minter Associates, and other sources. The combination of 
Carnegie classification and control is consistent throughout the institutional research literature 
for segmenting institutional types. It is assumed that data will vary with Carnegie classification 
and that Carnegie types provide an adequate proxy for institutional infrastructure. 

Massy and Goldman (1995) expand this schema for their model. They do a factor 
analysis on fifteen variables available in the NSF CASPAR system and group institutions based 
on their factor loadings. Their variables include the number of faculty, sponsored research and 
development dollars, full-time and part-time science and engineering personnel, science and 
engineering post-doctoral fellows, doctoral degrees (broken out by U.S. and foreign), master's 
degrees, bachelor's degrees, expenditures for research equipment, and total graduate students in 
science and engineering (full and part-time). 

In addition to the literature and web review, numerous interviews were conducted in 
person, in writing, and by telephone with staff from various agencies and higher education 
associations about their experience with and research on faculty data. These agencies included 
the National Science Foundation (NSF), the National Center for Education Statistics (NCES), the 
American Association of University Professors (AAUP), the Rand Corporation, Quantum 
Research Corporation (QRC), Pinkerton Computer Corporation, the U.S. Department of 
Education, the Council of Graduate Schools (CGS), the National Research Council (NRC), and 
the National Academy of Sciences (NAS). Many of these people participated in the 
NSF-sponsored SRS Professional Societies Workshop series that met in 1995 and 1996. The author is 
grateful for the assistance and guidance which these discussions provided during the course of 
the research and for the availability of the proceedings on the NSF SRS web site. 

The following persons were contacted and interviewed about the faculty datasets: Ernst 
Benjamin, AAUP; Sam Bettinger, Pinkerton; Joan Burrelli, NSF; Lawrence Burton, NSF; 
Michael Cohen, NCES; Valerie Martin Conley, NCES; Charles A. Goldman, RAND; Mary 
Golladay, NSF; Theresa Grimes, QRC; Linda Hardy, NSF; Susan Hill, NSF; Steve Hunt, U.S. 
Dept. of Education; Linda Parker, NSF; Carolyn Shettle, NSF; Peter Syverson, CGS; Veerle 
Van Meel, QRC; Jim Voytuk, NAS; Julie Wood, NRC; and Linda Zembler, NCES. 

Datasets 

The critical question in evaluating each dataset is whether it offers the necessary level of 
aggregation in the cross-tab of rank within tenure status by gender within ethnicity at the 
discipline level by institution. Every national survey instrument related to faculty (as currently 
constructed and evolved over time) was reviewed, along with information about the data element 
dictionary, the value labels for categorical variables, the sample size, the population size, error 
estimates, and weighting procedures. 
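
As an illustration only, the critical cross-tab can be sketched as a count of person-level records keyed by rank within tenure status and by gender within ethnicity. The record layout, field names, and category values below are hypothetical and are not drawn from any of the surveys reviewed; the sketch assumes that such person-level records exist for one discipline at one institution.

```python
from collections import Counter

# Hypothetical person-level faculty records for one discipline at one
# institution. Field names and values are illustrative assumptions only.
records = [
    {"tenure": "Tenured", "rank": "Professor", "ethnicity": "White", "gender": "Male"},
    {"tenure": "Tenured", "rank": "Professor", "ethnicity": "White", "gender": "Female"},
    {"tenure": "Tenure Track", "rank": "Asst. Prof", "ethnicity": "Black", "gender": "Female"},
    {"tenure": "Tenure Track", "rank": "Asst. Prof", "ethnicity": "Black", "gender": "Female"},
    {"tenure": "Non-Tenure Track", "rank": "Lecturer", "ethnicity": "Hispanic", "gender": "Male"},
]


def critical_crosstab(rows):
    """Count faculty by (tenure status, rank) x (ethnicity, gender)."""
    table = Counter()
    for r in rows:
        row_key = (r["tenure"], r["rank"])       # rank within tenure status
        col_key = (r["ethnicity"], r["gender"])  # gender within ethnicity
        table[(row_key, col_key)] += 1
    return table


crosstab = critical_crosstab(records)
for (row_key, col_key), n in sorted(crosstab.items()):
    print(row_key, col_key, n)
```

A dataset that cannot supply every key in this structure at the institution and discipline level cannot complete the cross-tab without aggregation.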

Other questions which are important to the evaluation include: How are the data made 
available to researchers? In what format and at what degree of aggregation are they available? 
How easy is it to aggregate, sort, group, and filter the data to produce the cross-tab of interest? 
Are there sufficient numbers in the cells at the lowest level (particularly women and minorities) 
to produce the cross-tab? What disciplinary taxonomy is used? Are the data normalized? Is it 
possible to merge the data with other datasets? How have the data been analyzed by other 
researchers or agencies and what publications have been produced based on the results? 
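
The question of sufficient cell sizes can be made concrete with a small check. This is a minimal sketch: the minimum count of 5 is an assumed threshold chosen for illustration, not a rule stated by any of the agencies, and the counts are invented.

```python
def small_cells(crosstab, min_count=5):
    """Return the cells whose count falls below min_count (an assumed threshold)."""
    return {cell: n for cell, n in crosstab.items() if n < min_count}


# Hypothetical counts keyed by (rank, ethnicity, gender).
counts = {
    ("Professor", "White", "Male"): 42,
    ("Professor", "Black", "Female"): 1,
    ("Asst. Prof", "Hispanic", "Female"): 3,
    ("Asst. Prof", "White", "Female"): 12,
}

flagged = small_cells(counts)
share_flagged = len(flagged) / len(counts)
print(sorted(flagged), share_flagged)
```

When a large share of cells is flagged, the cross-tab generally has to be aggregated to a coarser disciplinary or institutional grouping.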

(1) NSF-NIH Survey of Graduate Students and Postdoctorates in Science and Engineering (GSS) 

This survey has been conducted by the National Science Foundation in cooperation with 
the National Institutes of Health since 1972. 1994 data are the most current available. This 
survey "compiles a factual base for assessing shifts in graduate enrollment and the relationship 
between patterns of financial support for graduate students. It provides the only nationally 
representative data bank on sources of support of graduate science and engineering (S&E) 
students and their enrollment characteristics, and on S&E postdoctoral appointments" (Guide to 
the Data Files, 1997, http://www.nsf.gov/sbe/srs/gss/dug/parta.htm). Data are collected for 
full- and part-time students by gender within ethnicity and by funding source, and for other 
"nonfaculty doctoral research staff." 

Some critical variables have changed over the years, but the survey offers consistent data 
about enrollment of science and engineering graduate students at the program or departmental 
level. For several years in the 1980's, a sample of institutions was used. Information about 
faculty, including rank, tenure, appointments, and departures were included in the 1983 and 1984 
surveys. According to Quantum Research Corporation staff, "NSF was not pleased with the 
quality of the faculty data that could be collected through the Graduate Student Survey — it 
reflected only departments that had graduate programs; it did not satisfactorily deal with such 
problems as split appointments, etc. As a result the data were never published and were dropped 
from the survey during the next OMB clearance cycle. As a result, the data is not available for 
the public" (Van Meel, 1997). 

The entire universe of graduate programs in S&E is surveyed. In 1994, data on 11,425 
departments at 604 institutions were collected, with a 96% response rate. Numerous tables of 
the data are available on the web, as are public use data files in ASCII format. While no SAS 
programs are publicly available to read the files, the record layout is documented and the same 
data are available in CASPAR by institution and discipline. 

CASPAR (Computer-Aided Science Policy Analysis and Research) is a software tool 
developed by the National Science Foundation (NSF) and Quantum Research 
Corporation (QRC) to make data from NSF, NCES, and other surveys available to researchers 
and policy analysts. The CASPAR CD-ROM may be purchased from Quantum or downloaded 
free over the Internet (approximately 600 megabytes). While the CASPAR software fails to take 
advantage of the Windows environment, with practice it becomes a useful tool. When creating 
spreadsheet or SAS card formatted datasets with SAS read programs, limitations are placed on 
the number of observations which can be saved, requiring that multiple files be saved. Data by 
institution may be merged seamlessly within CASPAR by FICE code. Files created with 
CASPAR can include institutional attributes such as FICE code in order to merge files within 
SAS or other software. 
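
Outside of CASPAR or SAS, the same merge by FICE code can be sketched as a simple keyed join. The FICE codes, field names, and counts below are invented for illustration, and the in-memory layout is an assumption rather than the CASPAR export format.

```python
# Hypothetical institution-level extracts keyed by FICE code.
# The codes and counts are made up for illustration.
gss_postdocs = {"001234": {"postdocs": 85}, "005678": {"postdocs": 12}}
sed_doctorates = {"001234": {"doctorates": 240}, "009999": {"doctorates": 7}}


def merge_by_fice(left, right):
    """Full outer join of two institution-level files on FICE code."""
    merged = {}
    for fice in sorted(set(left) | set(right)):
        row = {"fice": fice}
        row.update(left.get(fice, {}))
        row.update(right.get(fice, {}))
        merged[fice] = row
    return merged


merged = merge_by_fice(gss_postdocs, sed_doctorates)

# Institutions present in only one file deserve a manual check before analysis.
unmatched = [f for f in merged if f not in gss_postdocs or f not in sed_doctorates]
print(merged["001234"], unmatched)
```

Keeping the unmatched codes visible, rather than silently dropping them, makes it easier to see where two surveys cover different institutional universes.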

The disciplinary taxonomy is unique to the GSS and includes an exhaustive breakout on 
health fields. A lookup table is available on CASPAR which rolls the 79 disciplines in the GSS 
into 30 of the 49 CASPAR discipline clusters. The CASPAR discipline taxonomy, while thorough, 
discards some important disciplinary distinctions. While data on psychology, economics, 
anthropology, geography, political science, public administration, linguistics, sociology, and 
other social sciences programs are included in the survey, the S&E purpose of the instrument 
means that it does not gather comparable data on other disciplines. Post-doctoral data for these 
disciplines and for S&E post-docs outside of academe (such as at national labs) must be estimated 
from the NSF Survey of Doctorate Recipients and the NRC Survey of Earned Doctorates. 

For the critical cross-tab table, the GSS can be used to complete the data on gender 
within ethnicity for S&E post-doctorates in academe, but only for 30 of the 49 CASPAR 
discipline clusters. If the CASPAR taxonomy is adopted for the model, crosswalks must be in 
place for every other dataset. 
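
A crosswalk of this kind amounts to a lookup table that rolls detailed disciplines up to broader clusters. The sketch below uses an invented three-entry mapping, not the actual CASPAR lookup table, and it reports any discipline that has no cluster assigned.

```python
from collections import defaultdict

# Hypothetical crosswalk from detailed survey disciplines to broader clusters.
# The real CASPAR lookup table is not reproduced here.
crosswalk = {
    "Biochemistry": "Biological Sciences",
    "Genetics": "Biological Sciences",
    "Sociology": "Social Sciences",
}

# Hypothetical counts at the detailed discipline level.
detailed_counts = {"Biochemistry": 30, "Genetics": 12, "Sociology": 8, "Art History": 4}


def roll_up(counts, mapping):
    """Sum detailed counts into clusters and list disciplines with no mapping."""
    clusters = defaultdict(int)
    unmapped = []
    for discipline, n in counts.items():
        cluster = mapping.get(discipline)
        if cluster is None:
            unmapped.append(discipline)
        else:
            clusters[cluster] += n
    return dict(clusters), unmapped


clusters, unmapped = roll_up(detailed_counts, crosswalk)
print(clusters, unmapped)
```

The unmapped list is the practical cost of adopting one taxonomy: any discipline without a cluster drops out of the model unless a crosswalk entry is supplied.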

(2) NRC Survey of Earned Doctorates (SED) 

The SED is collected by the National Research Council's Office of Scientific and 
Engineering Personnel and is available in the NSF Doctorate Records File for all persons 
receiving doctorates in the United States since 1957, with limited data on doctorate recipients 
from 1920-1956. The survey does not include professional degrees such as the M.D. Of 39,754 
new doctorates in 1993, there was a 95% response rate, with the survey usually considered a 
requirement of graduation paperwork. Records for non-respondents are created from 
commencement lists and other sources. 

The three digit taxonomy of disciplines used for the SED continues to evolve and is the 
most exhaustive of any of the surveys reviewed, with 330 distinct specialties. Specialty data are 
collected for each degree earned, the dissertation topic, field of intended postdoctoral study, and 
expected field of work. In addition to complete demographic data, the SED collects data on time 
to degree, financial support, and post-graduation plans. Aggregate data are available on 
CASPAR for academic years 1965-66 through 1994-95 at the institution level, but only for the 
49 CASPAR discipline clusters. No public use file comparable to the GSS file is available 
because of the need to protect confidentiality, but researchers can obtain permission from NSF to 
access the microdata under controlled conditions. 

Numerous tables from the data are available on the web and in annual publications from 
NRC and NSF such as Science and Engineering Degrees, Science and Engineering Indicators, 
and Women, Minorities, and Persons with Disabilities in Science and Engineering. The SED 
data on time to degree are an invaluable resource, although the survey does not account for 
periods of stop-out or part-time study. The data on post-doctoral plans are useful for analysis, 
but these are intentions, not reality. NSF staff report a significant discrepancy between intention 
to pursue a post-doc as reported in the SED and actually having done a post-doc as reported in 
the Survey of Doctorate Recipients (SDR). Similarly, the data on whether graduates plan to 
enter academe, government, or the private sector are useful, especially with the exhaustive 
specialty breakout. According to NSF staff, the SDR is a more accurate predictor than the SED 
for estimating what percentage of Ph.D. recipients are likely to complete a post-doc. 

For the critical cross-tab table, the SED provides a breakout of gender within race for 
doctoral recipients by institution at the lowest disciplinary taxonomy. 

(3) NSF Survey of Doctorate Recipients (SDR) 

The SDR collects data about doctoral recipients who receive their doctorates in the 
United States. In this longitudinal survey, new doctoral recipients are added each cycle and 
individuals over age 75 are dropped. The sample is drawn from the Doctorate Records File of 
the SED, with a sampling rate of approximately 1 to 12 and fifty thousand individuals surveyed 
in 1993. In 1993, the sample was expanded to include doctoral recipients in the humanities. 
Data from the survey are weighted to the census population estimates. The most recent data 
available are for 1995. 

In addition to demographic data not collected in the SED, the SDR collects data about 
employment, occupation, postdoctoral status, primary and secondary work activities, salary, 
sector of employment, and years of professional experience. Numerous reports based on the 
SDR are available on the web in HTML, PDF, and Excel formats and are published in print by 
NSF. The data element dictionary is available on the web and includes extensive technical notes 
and cross-tabs of responses to each variable in the 1993 and 1995 files. The survey also collects 
data on rank, tenure status, full-time/part-time status, occupation, and Carnegie classification of 
institution. 

The Science and Engineering Data System (SESTAT) was developed to provide research 
access to the three survey files maintained by NSF - the SDR, the National Survey of Recent 
College Graduates (NSRCG), and the National Survey of College Graduates (NSCG). The 
individual or combined data files may be used for analysis of the engineering and science 
workforce. No public use files or CASPAR aggregate datasets of the SDR are available because 
of the need to protect confidentiality, but researchers can obtain permission from NSF to access 
SESTAT data under controlled conditions and on the web. Immediate access to public use 
versions of the NSCG and NSRCG files is available on the web with a simple online application 
form. 

A problem of the SDR is that information about current employment is collected in major 
occupational codes. These lose the fine level of detail available in the SED. Field of degree is 
used extensively in some reports as if it were comparable to field of employment. NSF staff 
recognize that this assumption is erroneous and that many Ph.D.s work outside of their field of 
graduate degree. 

The crosswalk to CASPAR and other disciplinary taxonomies is simplistic, given the 
broad nature of the occupation codes. Another problem of the SDR which has been raised by 
NSF staff and in the online proceedings of the SRS Professional Societies Workshop series is 
that the survey, because it is based on the SED, excludes persons with professional degrees in the 
medical sciences, yet it surveys Ph.D.s working in this area. The data on medical sciences are, 
therefore, incomplete, unless persons have, for example, received both the Ph.D. and the M.D. 

A caveat of the SDR is that it is a sample, whereas the GSS and SED are of the entire 
universe of institutions and Ph.D. recipients. The SDR data are weighted to census population 
estimates, using the occupation codes. This weighting may be suspect, not because of the 
sampling techniques, but because the census data on occupations are themselves a sample (the 
long census form) and because census estimates may not be the best predictors of the higher 
education faculty population. The survey is stratified at many levels in two phases: "For the 
1993 SDR, the sample for the new cohort (1992-93 graduates) was selected as an independent 
supplement to the older cohort sample. The new cohort sample was selected using stratified 
simple random sampling, with comparable sampling rates and stratum definitions as those of the 
Phase 2 older cohort sample" (NSF, 1997, http://srsstats.sbe.nsf.gov/TECHINFO.HTML). 

Despite the complex stratification, it is necessary to evaluate the validity of the population 
estimates for postsecondary faculty in comparison to the IPEDS S or SA. 
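
One simple way to frame this validity check is to compare the weighted SDR estimate for a faculty group with a benchmark count from another source. The sketch below is illustrative only: it assumes a uniform weight of 12 (echoing the approximate 1 to 12 sampling rate, although actual SDR weights vary by stratum), and the benchmark figure is invented rather than taken from IPEDS.

```python
def weighted_total(sample):
    """Population estimate: the sum of the respondents' sampling weights."""
    return sum(person["weight"] for person in sample)


def relative_difference(estimate, benchmark):
    """Signed difference between estimate and benchmark, as a share of the benchmark."""
    return (estimate - benchmark) / benchmark


# Hypothetical respondents in one faculty group, each assumed to carry weight 12.
sample = [{"weight": 12.0} for _ in range(50)]

estimate = weighted_total(sample)   # 50 respondents x 12 = 600.0
benchmark = 640                     # hypothetical benchmark count
diff = relative_difference(estimate, benchmark)
print(estimate, diff)
```

A large relative difference for a given group would suggest that the census-based weights do not describe the faculty population well for that group.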

For the critical cross-tab, the SDR provides breakouts of gender within race and rank 
within tenure for a sample of S&E higher education faculty, but only for the 29 postsecondary 
occupation codes. An example of the limitations of this taxonomy may be seen in the occupation 
codes for the group "Life and Related Sciences," which has only four occupation codes: 

282710 Postsecondary teachers - Agriculture 
282730 Postsecondary teachers - Biological scientists 
282870 Postsecondary teachers - Medical science 
282970 Other postsecondary teachers - Natural sciences 

A crosswalk must be used between CASPAR and the SDR, losing most of the fine levels 
of aggregation built into the other surveys. The SDR does allow for estimates of post-doctoral 
data not included in the GSS. It is possible to estimate the percentage of Ph.D. recipients 
entering post-docs by occupation and field of degree. The longitudinal component of the SDR 
collects age-related data useful for modeling faculty retirement and longitudinal data useful for 
modeling rank transitions. Since the SDR is only a sample, it is not appropriate to gather data at 
the institutional level. If SDR data are to be used in the critical cross-tab, then it is necessary to 
aggregate the data up to a combination of Carnegie classification and control. Without FICE 
code data, additional variables for institutional segmentation such as done by Massy and 
Goldman (1995) are not available. 
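
The estimate of the percentage of Ph.D. recipients entering post-docs can be sketched as a weighted share within each field of degree. The records, field labels, and weights below are hypothetical; the sketch assumes each sampled person carries a sampling weight and a flag for having held a post-doc.

```python
from collections import defaultdict

# Hypothetical sampled doctorate recipients with sampling weights.
sample = [
    {"field": "Biological Sciences", "postdoc": True, "weight": 12.0},
    {"field": "Biological Sciences", "postdoc": False, "weight": 12.0},
    {"field": "Biological Sciences", "postdoc": False, "weight": 24.0},
    {"field": "Physics", "postdoc": True, "weight": 10.0},
]


def postdoc_share_by_field(rows):
    """Weighted share of recipients in each field who held a post-doc."""
    totals = defaultdict(float)
    postdocs = defaultdict(float)
    for r in rows:
        totals[r["field"]] += r["weight"]
        if r["postdoc"]:
            postdocs[r["field"]] += r["weight"]
    return {field: postdocs[field] / totals[field] for field in totals}


shares = postdoc_share_by_field(sample)
print(shares)
```

The same grouping logic applies if the key is replaced with a combination of Carnegie classification and control, which is the lowest institutional level the sample supports.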

(4) National Survey of Recent College Graduates (NSRCG) 

This survey gathers data about people who obtained a bachelor's or master's degree in 
science and engineering since 1990. The National Survey of College Graduates (NSCG) gathers 
comparable data on persons who obtained at least a bachelor's degree prior to 1990. Both 
surveys were designed to be similar to the SDR. Approximately 25,785 students in 273 
institutions were surveyed in 1993. There was a 99% school response rate in stage one, in which 
a sample of institutions provided lists of graduates. Data were collected using computer-assisted 
telephone interviewing, with an 86% response rate. Of the institutions included in the sample, 
"196 produced so many of the nation's S&E graduates that they were selected with certainty. 
From the remaining institutions, 79 institutions were selected using systematic, probability- 
proportional-to-size sampling, after sorting the file by ethnic status, region, public/private status, 
and presence of agriculture" (NSF, 1997, http://srsstats.sbe.nsf.gov/TECHINFO.HTML). 

This survey should not be confused with the Recent College Graduates survey conducted 
by NCES. The NCES survey has been conducted since 1976. According to survey documents on 
the web, in 1993 NCES established a longitudinal survey of graduating college seniors which 
was supposed to replace the RCG Study. Baccalaureate and Beyond is supposed to follow an 
"oversample of graduating seniors from the National Postsecondary Student Aid Study." The 
NCES version of the RCG was designed to "determine how many graduates become eligible or 
qualified to teach for the first time and how many were employed as teachers in the year 
following graduation, by teaching" and "to examine the relationship between courses taken, 
student achievement, and occupational outcomes" (NCES, 1997, 
http://www.ed.gov/NCES/surveys/rcg.html#overview). 

Like the SDR, the occupation field data related to faculty in the NSRCG are reported for 
29 postsecondary occupations, and then only for persons with S&E degrees. The technical 
notes about the survey explain that "individuals do not always know the precise definitions of 
occupations that are used by experts in the field and may thus select occupational fields that are 
technically incorrect" (Science and Engineering Work Force, 1997, 
http://x.nsf.gov:80/sbe/srs/survey2.htm). According to NSF staff, the use of occupation codes 
was simplified between the 1993 and 1995 surveys. 

For the critical cross-tab, the NSRCG provides gender within race data for the 29 postsecondary 
occupations by general type of academic institution. Since the survey is designed to 
reach new graduates and age data are collected, the results allow researchers to study the new 
faculty population in institutions which do not require the doctorate. Since it is a sample, 
institutional data are not appropriate and it is necessary to aggregate the cross-tab data to the*- 
general type of educational institution where faculty are employed. Carnegie, control, rank, and 
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tenure data are not available. In addition, the NSRCG is a sample, with results weighted to 
census estimates that may not be adequate for documenting the faculty population in higher 
education. 

(5) National Survey of College Graduates (NSCG) 

This survey gathers data about persons who stated on the long form for the 1990 Census 
that they held a bachelor's degree or above. Approximately 214,643 of the 4,728,000 people 
who completed the Decennial Census Long Form were selected. A self-administered mail 
survey was sent in 1993, with a response rate of 80%. Like the NSRCG, the data on faculty are 
collected for the 29 postsecondary occupations and the weighting is done to census estimates. 
The NSCG differs in that it collects data regardless of whether the degree was in an S&E field. 
The survey sample was stratified using demographic characteristics, highest degree achieved, 
occupation, and gender. 

For the critical cross-tab, the NSCG provides a breakout of gender within race for the 29 
post-secondary occupations by general type of educational institution. The data on age and other 
demographic and work-related variables may be useful in assumptions and models about 
comprehensive, liberal arts, and two-year institutions where the Ph.D. is not required. Carnegie, 
control, rank, and tenure data are not available. 

(6) NCES IPEDS survey of Earned Degrees (C) 

The Integrated Postsecondary Education Data System (IPEDS) Completions Survey is an 
annual NCES survey of all accredited two- and four-year institutions. No weighting techniques 
are used, since the entire population is surveyed at the school level. Response rates range from 
85% to 96%, with data on non-responding institutions imputed from the previous year's data. Out of 
the universe of 6,948 institutions included in the early release of 1995-96 data, a total of 5,762 
completed the survey for a response rate of 82.9%. 
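
The response rate reported above is a simple unweighted ratio of completed surveys to the survey universe. A minimal check of the arithmetic, using a hypothetical helper name:

```python
def response_rate(completed, universe):
    """Unweighted response rate, expressed as a percentage."""
    return 100.0 * completed / universe

# Figures for the early release of the 1995-96 completions data.
rate = response_rate(5762, 6948)
print(f"{rate:.1f}%")  # prints 82.9%
```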

Data from 1966 through 1994 are available on CASPAR at the institution level, using a 
crosswalk from the 1990 CIP code disciplines reported in the survey to the CASPAR 
taxonomy. The raw data for 1995-96, rolled up to the two-digit CIP code level, are also available 
for downloading from the NCES IPEDS web site. The 1994 data are available on the IPEDS 
CD-Rom. Aggregate degree data are offered by the IPEDS Interactive Database Search site 
(most current 1993). Data are also available through the NCES National Data Resource Center, 
which was established to "enable state education personnel, education researchers, and others to 
obtain special statistical tabulations and analyses of data sets maintained by NCES" (NCES, 
1997, http://www.ed.gov/pubs/ncesprograms/elementary/others/ndrc.html). The Pinkerton 
Computer Corporation is contracted to provide data services. NDRC has files for IPEDS, the 
National Postsecondary Student Aid Study, and the National Study of Postsecondary Faculty. 
NDRC is the logical source for completions data at the six-digit CIP code level by institution. 

For the critical cross-tab, the completions survey documents the number of doctoral 
graduates by gender within race by CIP code. The data on master's degrees may be useful for 
some models of doctoral enrollment demand and for predicting faculty supply for two-year 
colleges which do not require the Ph.D. 

(7) NCES IPEDS survey of Faculty Salaries (SA) 

The IPEDS Faculty Salary Survey collects data on full-time, instructional faculty, with 
breakouts by gender within rank within tenure, broken out by contract length for each institution. 
Data from 1971 through 1995 are available in CASPAR, though the calculation of the average, 
all-ranks salary differs from that of AAUP. The data are also available for download at the 
NCES IPEDS web site (1996-97 most current), on the IPEDS CD-Rom (1994 most current), and 
in the Interactive IPEDS Database search site (1993). 

Basic institutional identifiers such as Carnegie classification and control collected in the 
IPEDS Institutional Characteristics survey are also listed in the datasets. There is no disciplinary 
breakout. A total of 8,868 institutions were included in the universe in 1995, with an 85.5% 
response rate. 

It is important to note that the SA includes only full-time instructional faculty. It does 
not include faculty whose duties are 50% or more in research, service, or administration. For 
this reason, the SA is the best estimate of the total full-time faculty teaching population. 

Historical SA data are useful for tracking the growth of non-tenure track positions by type of 
institution. The SA is comparable to the AAUP faculty survey in the collection of salary and 
benefits expenditures and can be substituted for AAUP in the calculation of average salaries by 
rank, which converts twelve-month contracts to nine-month. 

For the critical cross-tab, the SA provides aggregate data on gender within rank within 
tenure at the institutional level. The IPEDS SA and S may be used as population estimates of 
faculty totals by Carnegie classification and control and can serve as a validity check for other 
estimates of the total, full-time, instructional faculty population. 

(8) NCES IPEDS Fall Staff survey (S) and the EEO-6 survey 

In 1993, the IPEDS Fall Staff survey replaced the EEO-6 survey administered by the 
Equal Employment Opportunity Commission. Prior to this, both surveys collected data on 
higher education full- and part-time faculty and staff biennially in odd-numbered years. The 
1993 IPEDS S included all institutions covered by the combination of both surveys, with 8,861 
postsecondary schools, including 3,670 institutions of higher education with accreditation 
recognized by the U.S. Department of Education. The response rate for 1993 was 87%, with 
data imputed for missing schools based on previous submissions. 

IPEDS data are available for download at the web site (1995 most current), on the IPEDS 
CD-Rom (1993), and at the Interactive IPEDS Database search site (1993). As in the IPEDS 
SA, Carnegie classification and control are collected as institutional identifiers and there is no 
disciplinary breakout. 

The bulk of the IPEDS S is devoted to data on the broad occupational categories 
developed by the EEOC for affirmative action reporting, with breakouts by gender within 
ethnicity for specified salary ranges for each category. Data are collected on part-time 
employees, including part-time faculty (with teaching, research, and service combined). A table 
of data similar to the IPEDS SA is collected, with full-time faculty data on rank within tenure, 
but expanded to gender within ethnicity. The survey combines all teaching, research, and service 
faculty and offers the opportunity to calculate the number of research and service faculty by 
subtracting the number of instructional faculty reported in the SA. Another table of information 
about new hires is collected, broken out for full-time faculty by gender and ethnicity. 
Unfortunately, this is not further broken out by tenure status or rank. These hiring data are 
the only national data source on the number of new faculty hired by specific institutions. 
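
The subtraction described above can be sketched as follows. The institution identifiers and counts are hypothetical, and the sketch assumes that both surveys report the same institutions for the same year with compatible full-time definitions.

```python
def research_service_faculty(s_counts, sa_counts):
    """Full-time faculty in the S minus instructional faculty in the SA.

    Both arguments map an institution identifier to a head count.
    Institutions missing from the SA are skipped, and a negative
    difference is kept so that reporting mismatches remain visible.
    """
    return {
        unit_id: total - sa_counts[unit_id]
        for unit_id, total in s_counts.items()
        if unit_id in sa_counts
    }

# Hypothetical counts for three institutions.
s_counts = {"inst_1": 1200, "inst_2": 300, "inst_3": 90}
sa_counts = {"inst_1": 950, "inst_2": 300}
difference = research_service_faculty(s_counts, sa_counts)
```

Because the S is collected in odd-numbered years, the SA file for the matching year would have to be used.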

For the critical cross-tab, the IPEDS S provides aggregate data by gender within ethnicity 
by rank within tenure, but only at the institution level. The survey provides the best estimate of 
the total full- and part-time faculty population. The data on new hires are useful in predicting an 
annual growth rate by institution and therefore for each stratum of Carnegie classification and 
control. 

(9) CUPA National Faculty Salary Survey 

The annual National Faculty Salary Survey by Discipline and Rank in Public Four-Year 
Colleges and Universities has been administered by the College and University Personnel 
Association (CUPA) since 1982. The survey collects data on the number and salaries of faculty 
by clusters of CIP code-level disciplines. A total of 357 institutions completed the survey for 
1996-97, many of them members of the American Association of State Colleges and Universities 
(AASCU). The data are broken out by gender and rank within discipline, with minimum, 
maximum, and average salaries for full-time, instructional faculty. Only data that fit into the 
survey's unique combinations of CIP code taxonomy are collected, so the results may not be used 
as an estimate of the total faculty population by discipline at the participating institutions. 

The survey is administered by Richard D. Howe at Appalachian State University and 
analyzed by the University of Oklahoma institutional research staff, which also administers the 
Oklahoma Faculty Salary Survey. A data book on the CUPA survey is published each year and 
participating institutions may purchase customized studies with special data tabulations for $250, 
without institutional identifiers. There is no collection of data by ethnicity or tenure status. Data 
are collected for new assistant professors as a subset of the assistant professor data. These are 
potentially useful for estimating the number of new assistant professor hires by discipline and are 
used to document benchmarks of the salary marketplace. 

For the critical cross-tab, the CUPA survey provides data by gender within rank for 
clusters of CIP code disciplines. The sample size does not permit extrapolation for non-AASCU 
institutions. The data on new assistant professors are useful in making assumptions about the 
number of new hires by discipline, though these are not qualified by whether they are tenure-track 
or restricted positions. 

(10) Oklahoma Faculty Salary Survey 

The annual Oklahoma Faculty Salary Survey is comparable to the CUPA survey, but is 
expanded to all CIP codes used by participating institutions and includes data on ethnicity. Like 
the CUPA, the Oklahoma survey is limited to a relatively small number (84 in 1996-97) of 
AASCU and NASULGC (National Association of State Universities and Land Grant Colleges) 
institutions and collects data only on full-time, instructional faculty. A subset of 30 institutions 
is analyzed by the University of Alabama for the Southern University Group (SUG). 

Data are collected for gender, ethnicity, and rank, with a breakout for new assistant 
professors identical to that of the CUPA survey. In order to be listed in the print reports which 
aggregate data by discipline, a CIP code must be used by more than a few institutions. If there is 
no match, the data are rolled into the other (99) version of the four-digit CIP code and then to the 
(01) version of the CIP. 

Electronic data are provided to participating institutions, allowing institutional research 
offices to weight the data to match their own profile of disciplines. Reports on faculty salaries 
by CIP code are published regularly by the University of Oklahoma's institutional research 
office. Depending upon the office's workload, the additional data on ethnicity are reported in 
occasional years. Oklahoma will produce special data tabulations without institutional identifiers 
for a fee. 

For the critical cross-tab, the Oklahoma survey provides data on gender within ethnicity 
by rank at the CIP code discipline level, but only for a relatively small, somewhat homogeneous 
sample of institutions. 

(11) National Survey of Postsecondary Faculty (NSOPF) 

The NSOPF was conducted in 1988 and in 1993 by NCES, with support from NSF and 
the National Endowment for the Humanities. In 1993, institutional and faculty versions of the 
survey were used. A department chair survey was administered in 1988. The 1993 NSOPF was 
administered by the National Opinion Research Center (NORC) at the University of Chicago. 

NSOPF is the primary survey of faculty activities, demographics, and attitudes. A two-stage 
sampling procedure was used for the faculty questionnaire. First, 974 institutions were 
contacted, of which 817 agreed to participate. These institutions provided lists of faculty by 
discipline. Disciplinary data were recorded in order to over-sample four NEH disciplines. The 
sampling rate was also increased for full-time women and minorities. From the lists, samples 
with a measure of size of 41.5 faculty (41 or 42) per institution were developed, stratified by 
Carnegie classification and control. Most public and private research universities and most 
public doctoral universities were included (with certainty) in the sample. A total of 25,780 
surveys were completed for a response rate of 86.6%. 

In analyzing the NSOPF data for 1988 and 1993, anomalies were detected in the number 
of part-time and health science faculty. The initial Data Analysis System (DAS) and analyses 
were revised and re-released after it was determined that the survey was not adequately 
administered to medical school faculty and that the weights of part-time faculty were incorrect 
due to problems in the institutional lists. The part-time issue has been corrected, but NSOPF still 
under-reports health science faculty. While data on discipline were collected with 149 possible 
fields, the sample was not stratified by discipline. For this reason and because of the problems 
with health sciences, the data should not be interpreted by discipline. Even analysis by clusters 
of disciplines is suspect. 

The NSOPF data are available for study in a data analysis system (DAS) on CD-Rom 
from NCES. The CD also includes data on most other non-IPEDS surveys administered through 
NCES. The Windows-based software allows filtered, two-dimensional cross-tabs. Two versions 
of the software are provided, one for regular tables and one that produces correlation matrices for 
further analysis in SAS or SPSS. The software produces a tab-delimited text file with 
information on weights, Ns, and standard errors. Microdata are available for controlled use 
under licensing agreements with NCES. In addition, the National Data Resource Center is able 
to produce data tables from the NSOPF if the required analysis cannot be easily obtained with 
the DAS. 

Several reports of the NSOPF data are now available, including Faculty and Instructional 
Staff: Who Are They and What Do They Do? and Institutional Policies and Practices Regarding 
Faculty in Higher Education. A Methodology Report is forthcoming in the Summer of 1997. 

The report Women and Minority Faculty in Science and Engineering is being prepared by staff 
of the Pelavin Research Institute (1997) and will also be distributed soon. This is the only report 
of its kind focused on science and engineering faculty in higher education. Other analyses 
conducted by NSF on science and engineering faculty are limited in their scope to the SESTAT 
occupation codes. Basic information about the survey is also available at the NCES web site. 

It is important to note that the definition of faculty used for the NSOPF differs from that 
of the IPEDS S and SA. The institutional lists included full-time, part-time, permanent, 
temporary, instructional faculty and staff, along with non-instructional faculty. This is an 
important source of information on part-time and temporary staff. The reader must be careful, 
though, in interpreting tables of NSOPF data to ensure that the correct faculty definition is used. 

In weighting the sample to the population, NORC first weighted the respondents by 
institutional type to the lists from institutions (approximately 500,000 faculty names). These 
data were then weighted again by institution to 17 strata of Carnegie classification and control 
and the total faculty population as documented in the IPEDS S. The number of strata is uneven 
because there are no public, religious institutions. 

For the critical cross-tab, the faculty questionnaire provides data on gender within 
ethnicity by rank within tenure. Unfortunately, the data need to be aggregated by Carnegie and 
control and the data on disciplines are not usable because they were not stratified in the sample. 

The institutional survey contains additional data about instructional and non-instructional 
faculty hires, retirements, and downsizing. Totals of instructional and non-instructional tenured 
and tenure track faculty are collected for Fall 1991 and Fall 1992 by institution. The numbers of 
faculty considered for and granted tenure are also documented. These types of data are 
extremely valuable in making assumptions about faculty mobility. 

(12) UCLA HERI Faculty Survey 

The Faculty Survey administered by the Higher Education Research Institute (HERI) of 
the University of California - Los Angeles is very similar to the NSOPF in its focus on faculty 
demographics, activities, and attitudes. The survey was most recently administered in 1995, with 
384 institutions and 33,986 respondents, for an overall response rate of 42%. Faculty, for the 
purpose of the survey, are defined broadly. Depending upon whom institutions chose to sample, 
the survey includes employees who teach undergraduates, full-time administrators, full-time 
researchers, and faculty who teach only at the graduate level. 

This survey is an invitational sample and HERI charges institutions a fee based on the 
number of faculty surveyed, similar to the administration of the UCLA CIRP Freshmen survey. 
The publication The American College Teacher: National Norms for the 1995-96 HERI Faculty 
Survey is sent to participating institutions and is available for purchase from HERI. HERI sends 
a standard set of cross-tabs of the data to institutions and will prepare additional analysis of data 
for a fee. 

"National Norms" were developed based on the portion of respondents who code 
themselves as undergraduate teaching faculty. The norms included all institutions which 
surveyed a minimum percentage of their faculty population, as determined from analysis of 
IPEDS reports. The list of participating institutions was examined "using a 23 cell stratification 
based on institutional type, selectivity, and control" (Sax et al., 1996, p. 1). The sample was 
supplemented with 21 randomly selected institutions for the cells with low Ns and the 
participation of 22 additional institutions was supplemented with funding from the Corporation 
for National Service. 

For the critical cross-tab, the HERI faculty questionnaire provides data on gender within 
ethnicity by rank within tenure status by Carnegie classification and control. Data on discipline 
are not collected. Also, the instrument does not permit coding faculty as non-tenure track; it 
records only whether and when tenure was awarded. 

(13) Doctoral Program Rankings - 1995 

The National Research Council (NRC) collected data on faculty as part of its doctoral 
program rankings project in 1982 and 1995. The 1980 data were used by Massy and Goldman 
(1995) to calculate the base number of faculty by discipline. For the 1995 study, the NRC 
gathered data on 41 fields selected because of three factors: the number of Ph.D.s produced 
nationally, the number of programs training Ph.D.s within a particular field, and the average 
number of Ph.D.s produced per program. 

Based on reports from Institutional Coordinators (ICs) who provided information about 
their programs, 3,634 research-doctorate programs at 274 U.S. universities were targeted. Of 
these, 105 were private and 169 public. "This sample represents about 35 percent more 
programs than the number included in the 1982 study. Taken together, these programs involved 
about 78,000 faculty members and trained about 90 percent of the total number of Ph.D.s 
produced in these fields between 1986 and 1992. Of the 228 institutions in the 1982 study, 214 
participated in this one and many added more programs for review" 
(National Research Council, 1997, 
http://www.nap.edu/readingroom/books/researchdoc/summary.html). 

Data on specific faculty were taken from "various sources of information," including the 
Doctorate Records File of SED data. Using the combination of IC reports and faculty survey 
instruments, data were gathered about program ratings, Ph.D. recipients, women and minority 
enrollment and degree patterns, and the number of faculty. 

In addition to the book Research-Doctorate Programs in the United States: Continuity and 
Change, various documents and Excel spreadsheets are available on the web. A CD-Rom of the 
data will be available in a few weeks, according to Jim Voytuk at the National Academy of 
Sciences. Voytuk believes that since there was a "Greater number of programs and fields for 
the 1995 study than was used in 1982" it is possible to "use the data to project overproduction" 
(Voytuk, 1997). 

For the critical cross-tab, it is possible to document a portion of the number of higher 
education faculty, but only for the 41 targeted programs and only for the 214 institutions which 
participated. Information on gender, ethnicity, and tenure status is not collected. 

Results 



The analysis of the datasets suggests that, while many important policy questions may be 
studied with the data, it is not possible to complete the critical cross-tabulation of gender within 
ethnicity by rank within tenure status by discipline at the institution level. With every data 
source, there are inherent limitations in the instrument, the sampling, or the coding. 

In order to merge the different datasets, it is necessary to adopt the CASPAR disciplinary 
taxonomy. If the Survey of Doctorate Recipients (SDR) data are to be used, and these are the 
best source of data on non-S&E post-docs and the percent of doctorate recipients entering 
academe, the 49 disciplines in the CASPAR taxonomy must be rolled up to the 29 postsecondary 
census occupations. This is unfortunate, given the fine level of disciplinary detail built into some 
of the surveys, particularly the Survey of Earned Doctorates (SED). 

No single dataset is capable of addressing the myriad of policy questions about faculty 
availability at the required level of aggregation. On the surface, the NSOPF seems closest. 
However, the lack of stratification by discipline and the problems in sampling health science 
faculty are significant drawbacks to the survey's utility for this purpose. Many one- and two- 
dimensional tables may be created with the thirteen datasets which are of great importance to 
policy makers. Complex models of faculty availability for affirmative action and for 
understanding Ph.D. production require weighting and assumptions based on different datasets. 

It is possible to create a model of faculty availability which incorporates the best of what 
scholars can learn from several of the surveys. The components of such a model would include: 

Ph.D. Recipients 

The universe of doctoral graduates is available by gender within ethnicity at the 
institution level in the SED and the IPEDS C. The various SED and IPEDS C taxonomies are 
already crosswalked to CASPAR. These need to be normalized to the 29 postsecondary 
occupation census codes used in SESTAT. 
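
The normalization step above amounts to aggregating counts through two successive crosswalks. The sketch below uses hypothetical code labels and a hypothetical roll_up helper; the real crosswalk tables would cover all 49 CASPAR disciplines and the 29 occupation codes.

```python
from collections import Counter

def roll_up(counts, *crosswalks):
    """Aggregate counts keyed by a fine code through successive crosswalks.

    Each crosswalk maps a code at one level to a code at the next, coarser
    level. An unmapped code raises KeyError instead of being dropped silently.
    """
    rolled = Counter()
    for code, n in counts.items():
        for crosswalk in crosswalks:
            code = crosswalk[code]
        rolled[code] += n
    return dict(rolled)

# Hypothetical crosswalk fragments (labels are placeholders, not real codes).
cip_to_caspar = {"cip_1": "caspar_A", "cip_2": "caspar_A", "cip_3": "caspar_B"}
caspar_to_occupation = {"caspar_A": "occ_1", "caspar_B": "occ_2"}

doctorates = {"cip_1": 10, "cip_2": 5, "cip_3": 7}
by_occupation = roll_up(doctorates, cip_to_caspar, caspar_to_occupation)
```

Running the same counts through only the first crosswalk gives the CASPAR-level totals, which makes it easy to see how much disciplinary detail is lost at each step.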

Using the SDR, the percent of Ph.D. recipients entering academe may be estimated by 
gender within race, occupation code, and by Carnegie and control. The validity of the estimates 
needs to be examined against data from the SED. 

The percent entering S&E post-doctoral programs in academe may be estimated by 
gender within ethnicity and by discipline using the GSS. The discipline data must be rolled up to 
the CASPAR taxonomy and then to the 29 occupation codes. The SDR may be used to estimate 
the percentage of graduates entering post-docs outside of academe in national labs and industry. 
The SDR and the NSOPF may also be used to estimate the percentage of graduates entering post- 
docs in non-S&E disciplines. 

Faculty 

The SDR, NSRCG, and NSCG datasets may be used to generate estimates of faculty 
numbers by gender within ethnicity by occupation code and type of institution (control and 
Carnegie for the SDR, type of educational institution for the other two SESTAT datasets). 

These need to be evaluated against data obtained with the Oklahoma survey for public research 
and doctoral institution percentages. 

The SESTAT data are weighted against census estimates for the entire U.S. population. 
The weights need to be recalculated using the IPEDS S to estimate the total number of full-time 
and part-time, instructional and research faculty positions in higher education by occupation, 
broken out by general type of educational institution. 
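
One simple way to recalculate the weights is a ratio adjustment within each type of institution, so that the weighted survey total matches an external control total. The sketch below is an assumption about how this could be done, not a description of the SESTAT or NCES procedures; the strata, weights, and control totals are hypothetical.

```python
def ratio_adjust(records, control_totals):
    """Scale weights so each stratum's weighted total equals its control total.

    records: list of dicts with 'stratum' and 'weight' keys.
    control_totals: stratum -> population total (e.g. from the IPEDS S).
    """
    weighted = {}
    for rec in records:
        weighted[rec["stratum"]] = weighted.get(rec["stratum"], 0.0) + rec["weight"]
    return [
        dict(rec, weight=rec["weight"] * control_totals[rec["stratum"]] / weighted[rec["stratum"]])
        for rec in records
    ]

# Hypothetical respondents and control totals.
records = [
    {"stratum": "four-year", "weight": 100.0},
    {"stratum": "four-year", "weight": 300.0},
    {"stratum": "two-year", "weight": 50.0},
]
control_totals = {"four-year": 500.0, "two-year": 40.0}
adjusted = ratio_adjust(records, control_totals)
```

The adjustment preserves the relative weights of respondents within a stratum and changes only the level to which they sum.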

Rates for faculty transitions should be calculated by occupation code, gender, ethnicity, 
rank, tenure, and institutional type from the SDR. These also need to be calculated by rank, 
tenure, and institutional type from the NSOPF faculty questionnaire. Rates for faculty retirement 
and attrition may be estimated by occupation code and institutional type from the SDR and by 
institutional type from the NSOPF institutional questionnaire. Rates for new faculty hires may 
be calculated by institutional type from the IPEDS S, the NSOPF institutional questionnaire, and 
perhaps the CUPA and Oklahoma surveys. Of these rates, only those taken from the SDR and 
the IPEDS S allow breakout by gender within ethnicity. 

The Massy and Goldman (1995) study predicts faculty mobility and new hires based on 
complex econometric models about undergraduate enrollment (faculty productivity), research 
expenditures, departmental budgets, and differences by institutional segmentation. The rates of 
growth in faculty positions that the study calculates for each of ten disciplines can be used to 
project faculty demand. 
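
A minimal stock-and-flow sketch shows how a growth rate and an attrition rate combine into a projection of hiring demand. The identity used here (hires equal net growth plus replacement of departures) is an assumption of this sketch, and the rates are hypothetical rather than taken from Massy and Goldman (1995) or from any of the datasets.

```python
def project_hires(base_positions, growth_rate, attrition_rate, years):
    """Project faculty positions and the hires needed each year.

    hires = net growth in positions + replacement of retirements/attrition.
    Returns a list of (year, positions_at_end_of_year, hires) tuples.
    """
    rows = []
    stock = float(base_positions)
    for year in range(1, years + 1):
        next_stock = stock * (1 + growth_rate)
        hires = (next_stock - stock) + attrition_rate * stock
        rows.append((year, next_stock, hires))
        stock = next_stock
    return rows

# Hypothetical discipline: 1,000 positions, 2% growth, 5% annual attrition.
projection = project_hires(1000, 0.02, 0.05, years=2)
```

In practice the rates would differ by occupation code, rank, tenure status, and institutional type, which is why the sources of each rate matter.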

Conclusions 

It is clear that a massive research effort is underway at the national level to collect data 
about faculty. These thirteen datasets address many policy issues. Unfortunately, there are 
problems in using each of them for complex modeling for faculty availability and Ph.D. 
production studies. Should one of the data elements in the cross-tabs be eliminated? 

What is the appropriate level of aggregation? Each level represents a necessary assumption, 
albeit one of many. This analysis shows that the disciplinary crosswalk must be rolled up to the 
29 postsecondary census occupation codes in order to use the SDR. Rank and tenure data are 
critical to calculating projections for new assistant professor and non-tenure track hires. If the 
gender or ethnicity variables are discarded, the utility of the model for affirmative action and 
faculty pipeline research is greatly diminished. Since institutional data are not available across 
each of the key datasets, the combination of Carnegie classification and control is essential to 
differentiating segments of institutions. 

If scholars, researchers, and practitioners agree that the critical cross-tab detailed in this 
paper is essential to policy analyses about affirmative action and Ph.D. production, then an 
additional survey instrument should be proposed to NSF and/or NCES. As part of the IPEDS 
Technical Panel which met in May, 1997, the author suggested that the IPEDS S survey be 
expanded to include a table on full-time instructional, research, and service faculty. The table 
would be broken out by gender within ethnicity by rank within tenure status by discipline 
clusters. Conceivably, the table could aggregate all disciplines into ten or simply collect data on 
different disciplines in different years. At ten disciplines a year, over a five-year cycle, data 
could be collected on all 49 CASPAR discipline clusters. Mary Golladay and NSF staff have 
worked for many years to generate interest in a faculty survey at the S&E department level 
comparable to the GSS. It would be unfortunate, though, if this were limited to only S&E 
disciplines. 
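
The rotating collection suggested above can be checked with a short sketch. The cluster labels are placeholders and the helper name is hypothetical; the only figures taken from the text are 49 clusters and ten per year.

```python
def rotation_schedule(clusters, per_year):
    """Split an ordered list of discipline clusters into yearly batches."""
    return [clusters[i:i + per_year] for i in range(0, len(clusters), per_year)]

# Placeholder labels for the 49 CASPAR discipline clusters.
clusters = [f"cluster_{i:02d}" for i in range(1, 50)]
schedule = rotation_schedule(clusters, per_year=10)

for year, batch in enumerate(schedule, start=1):
    print(year, len(batch))
```

Four years carry ten clusters each and the fifth carries the remaining nine, so a five-year cycle covers the full taxonomy.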

The Professional Societies Workshop series sponsored by NSF's Division of Science 
Resources Studies has brought agency and association personnel together in frank discussions of 
the utility and limitations of these instruments and datasets and the implications for policy 
research. Several disciplinary associations conduct their own surveys of Ph.D. recipients and 
faculty and are perhaps the best prepared to understand discipline-specific tensions of supply and 
demand. These insights need to be incorporated into the assumptions of these complex models. 

Several additional research projects are currently underway. According to Ernst 
Benjamin of AAUP, the Sloan Foundation is funding an effort by Jack H. Schuster of the 
Claremont Graduate School and others to examine and develop the kind of complex econometric 
models studied by Massy and Goldman (1995). The Mellon Foundation is funding a 
longitudinal study being conducted at the University of California at Berkeley by Maresi Nerad 
and Joe Cerny entitled "The Ph.D. Ten Years Later Study." According to Peter Syverson of 
CGS, the objectives of the longitudinal study are to (1) determine career paths; (2) understand 
reasons for choosing career paths; and (3) understand employment patterns of women and 
minority doctoral recipients. 

Except for the longitudinal study, each of these models shares the limitations of the 
existing datasets and must by nature make many kinds of assumptions. The Massy and Goldman 
study, interesting as it is for its explanation of departmental activity in predicting faculty 
demand, is flawed by its use of the 1980 NRC doctoral rankings data. A model is presented in 
this paper which uses the best of the national datasets, but it makes many levels of assumptions 
which are subject to error. No model for understanding faculty availability or supply and 
demand will be credible unless an effort is made to survey the entire faculty population at the 
discipline level. No amount of careful sampling techniques, complex stratification, and 
weighting methodologies will substitute. 